Boosted Learning in Dynamic Bayesian Networks for Multimodal Speaker Detection
نویسندگان
چکیده
Bayesian network models provide an attractive framework for multimodal sensor fusion. They combine an intuitive graphical representation with efficient algorithms for inference and learning. However, the unsupervised nature of standard parameter learning algorithms for Bayesian networks can lead to poor performance in classification tasks. We have developed a supervised learning framework for Bayesian networks, which is based on the Adaboost algorithm of Schapire and Freund. Our framework covers static and dynamic Bayesian networks with both discrete and continuous states. We have tested our framework in the context of a novel multimodal HCI application: a speech-based command and control interface for a Smart Kiosk. We provide experimental evidence for the utility of our boosted learning approach.
منابع مشابه
Boosting and Structure Learning in Dynamic Bayesian Networks for Audio-Visual Speaker Detection
Bayesian networks are an attractive modeling tool for human sensing, as they combine an intuitive graphical representation with ef£cient algorithms for inference and learning. Earlier work has demonstrated that boosted parameter learning could be used to improve the performance of Bayesian network classi£ers for complex multi-modal inference problems such as speaker detection. In speaker detect...
متن کاملMultimodal Speaker Detection using Error Feedback Dynamic Bayesian Networks
Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs) which allow the power of statistical inference and learning to be com...
متن کاملMultimodal Speaker Detection Using Error Feedback Dynamic Bayesian Networks
Design and development of novel human-computer interfaces poses a challenging problem: actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs) which allow the power of statistical inference and learning to be com...
متن کاملMultimodal Speaker Detection Using Input/Output Dynamic Bayesian Networks
Inferring users’ actions and intentions forms an integral part of design and development of any human-computer interface. The presence of noisy and at times ambiguous sensory data makes this problem challenging. We formulate a framework for temporal fusion of multiple sensors using input–output dynamic Bayesian networks (IODBNs). We find that contextual information about the state of the comput...
متن کاملFloor holder detection and end of speaker turn prediction in meetings
We propose a novel fully automatic framework to detect which meeting participant is currently holding the conversational floor and when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the speaker diarization and the s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001